ABSTRACT

A model of the child's learning of the past tense 
forms of English verbs is discussed. This connectionist model takes 
as input a present-tense verb and provides as output a past tense 
form. A new simulation is applied to 13 problems raised by critics of 
the model, presented as fundamental flaws in the conceptualizations 
underlying connectionism. The new simulation uses a new input 
representation based on feature/slot units and two views on input 
words, a new architecture using identity mappings and hidden units, a 
new learning algorithm, and an input corpus that includes all five 
cells of the English verb paradigm. Together the changes led to a 
vast improvement in the model's performance. All problems but those 
dealing with the issue of direct access within connectionist nets 
were addressed successfully. It is concluded that connectionist 
models are extremely useful ways of characterizing the learning of 
inflectional systems, and that the critiques erred by confusing 
conceptualizations of the model with implementations. (MSE) 
Implementations are not Conceptualizations: 
Revising the Verb Learning Model 

Brian MacWhinney 
Jared Leinbach 
Carnegie Mellon University 



A recent issue of Cognition was devoted to an in-depth criticism of the connectionist 
agenda. At the focus of much of this criticism was a PDP simulation by Rumelhart and 
McClelland (1986) of the acquisition of the phonological form of the English past tense. A 
lengthy article by Pinker and Prince (1988) was devoted entirely to the critique of the R&M 
verb learning model. A somewhat shorter article by Lachter and Bever (1988) devoted 
most of its pages to the R&M verb learning model, while also critiquing four other 
connectionist models in somewhat less detail. The chapter that served as the target of this 
extensive criticism had as its goal the formulation of a computationally explicit 
connectionist model of the child's learning of the past tense forms of English verbs. This 
"verb-learning" model of Rumelhart and McClelland (henceforth R&M) was capable of 
taking as input a present tense verb, such as "ring," and providing as output a past tense 
form, such as "rang." It did this without any overt encoding of a set of rules and without 
any formal construction of morphological paradigms. 

The arguments against this model presented by Pinker and Prince (henceforth P&P) and 
Lachter and Bever (henceforth L&B) were well-constructed and thoughtfully developed, 
and their critique has generated a great deal of useful discussion. Moreover, there is good 
reason to agree with many of the detailed aspects of the two critiques. However, these 
authors are mistaken in thinking that their criticisms call into question the general 
connectionist conceptualization, rather than merely casting doubt on a particular 
connectionist implementation. In this regard, it is crucial to note that P&P believed that 
their analysis was quite profound (p. 82). 

"We will conclude that the claim that parallel distributed processing networks can 
eliminate the need for rules and for rule induction mechanisms in the explanation 
of human language is unwarranted. In particular, we argue that the shortcomings 
are in many cases due to central features of connectionist ideology and 
irremediable; or if remediable, only by copying tenets of the maligned symbolic 
theory. The implications for the promise of connectionism in explicating language 
are, we think, profound." 

L&B also believed that they had isolated a fundamental weakness in all connectionist 
approaches to language learning (p. 243). 

"The connectionist models we have considered arrive at rule-like regularities in 
language behavior only insofar as the models already contain architectures and 
devices explained in humans by mental representations of categorical rules." 

In this paper, we will consider whether these analyses are as profound as their authors 
claimed them to be. 
1. The Criticisms 

P&P and L&B argued that claims made by R&M regarding the success of the verb learning 
model are extremely misleading. They expressed their respective cases against the verb 
learning model in terms of a set of points. P&P (p. 81) presented at least ten specific 
problems which they believed were incorrectly addressed by the R&M model. These 
problems are repeated below in quotation marks. Numbers and problem names have been 
added for ease of reference. 

1. The U-shaped learning problem: "Rumelhart and McClelland's actual 
explanation of children's stages of regularization of the past tense morpheme is 
demonstrably incorrect." 

2. The "ated" problem: "Their explanation for one striking type of childhood 
speech error is also incorrect." 

3. The "hit-hit" problem: "Their other apparent successes in accounting for 
developmental phenomena either have nothing to do with the model's parallel 
distributed processing architecture, and can easily be duplicated by symbolic 
models, or involve major confounds and hence do not provide clear support for the 
model." 

4. The "algalgal" problem: "The model is incapable of representing certain kinds 
of words." 

5. The "slit-silt" problem: "It is incapable of explaining patterns of psychological 
similarity among words." 

6. The "brag-grab" problem: "It easily models many kinds of rules that are not 
found in any human language." 

7. The phonological regularities problem: "It fails to capture central 
generalizations about English sound patterns. It makes false predictions about 
derivational morphology, compounding and novel words." 

8. The homophony problem: "It cannot handle the elementary problem of 
homophony." 

9. The convergence problem: "It makes errors in computing the past tense forms 
of a large percentage of the words it is tested on." 

10. The regular pattern problem: "It makes incorrect predictions about the reality 
of the distinction between regular rules and exceptions in children and in 
languages." 

Problems 1, 4, 5, 7, and 10 were also raised by L&B. In addition, L&B provided a 
second form of criticism that is quite different from that developed by P&P. L&B claimed 
that the R&M verb learning model achieved much of its success by using a variety of 
TRIGS (The Representations It Crucially Supposes). They believed that, together, these 
TRIGS led to a cryptoembodiment of rules within the connectionist net. These TRIGS all 
relate to the design and interpretation of the Wickelfeatures in the verb learning model. 
L&B complained that the selection of 460 particular Wickelfeatures from a possible set of 
about 2000 involved a variety of decisions that tended to reconstitute traditional segmental 
phonemic information. L&B used this observation to argue that connectionist architectures 
necessarily contain cryptoembodiments of rules. 
11. The cryptorule problem: The selective development of an input representation 
can lead to the cryptoembodiment of rules in PDP nets. 

There are two additional problems with the R&M verb learning model that were not raised 
by either P&P or L&B. 

12. The early noise problem: In the R&M model, many epochs of training were 
required before the model would output phonological forms close to those of 
recognizable words. There is indeed an early period in child phonology when 
words have a very indistinct shape. However, when children are working out the 
various forms of the verb, most of these phonological limitations have been 
overcome. To be sure, early past tenses include many overregularizations and 
mismarkings, but these errors are typically modifications of the basic form of the 
verb, rather than phonologically inarticulate forms. 

13. The direct access problem: The R&M model works by converting one 
phonological representation into another phonological representation. This mode of 
access takes a "basic" form and uses it to find a "derived" form. This is certainly 
one of the ways in which we can access the correct past tense of a verb. However, 
we can also access words directly through meaning. It is not yet clear how 
connectionist models can simulate direct access in a theoretically interesting way. 



2. The new simulation 

The new simulation uses a new input representation, a new output representation, a new 
network architecture, a new learning algorithm, and a new input corpus. Each word is 
represented by a trisyllabic left-justified pattern and a monosyllabic coda in a right-justified 
pattern. The trisyllabic pattern takes the form CCCVVCCCVVCCCVVCCC, where C 
stands for consonant and V stands for vowel. The coda pattern takes the form VVCCC. 
The way in which sounds are filled into these slots is explained below. Vowel nuclei are 
composed of up to two segments and consonantal clusters are composed of up to three 
segments. If a particular segment is not actually present in a word, its features are simply 
left off. 

The 14 vowels of English are represented by eight distinctive features. Since there are six 
possible vowel slots, there are a total of 48 units dedicated to vowels. The 22 consonants 
of English are represented by ten distinctive features. Since there are twelve possible 
consonantal slots, there are a total of 120 units dedicated to consonants. Together, the 120 
consonantal units and the 48 vocalic units yield a combination of 168 feature/slot units for 
the left-justified representation and another 46 feature/slot units for the right-justified coda 
for each input word. The input also includes five units dedicated to the five cells in the 
paradigm for English verbs. There is one unit each for the present, the past, the past 
participle, the present participle, and the third person singular present. 
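
The unit counts above follow directly from the two slot templates. As a quick check, the arithmetic can be sketched as follows; the constant and function names are illustrative only, and the total of 219 input units is simply the sum of the three pools named in the text.

```python
# Feature/slot bookkeeping for the input representation described above.
VOWEL_FEATURES = 8        # distinctive features per vowel slot
CONSONANT_FEATURES = 10   # distinctive features per consonant slot

LEFT_TEMPLATE = "CCCVVCCCVVCCCVVCCC"   # trisyllabic, left-justified
CODA_TEMPLATE = "VVCCC"                # monosyllabic coda, right-justified
PARADIGM_CELLS = 5                     # one unit per cell of the verb paradigm


def unit_count(template: str) -> int:
    """Number of feature/slot units needed for a slot template."""
    vowels = template.count("V")
    consonants = template.count("C")
    return vowels * VOWEL_FEATURES + consonants * CONSONANT_FEATURES


left_units = unit_count(LEFT_TEMPLATE)    # 6 * 8 + 12 * 10
coda_units = unit_count(CODA_TEMPLATE)    # 2 * 8 + 3 * 10
input_units = left_units + coda_units + PARADIGM_CELLS
print(left_units, coda_units, input_units)  # prints: 168 46 219
```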

The output of the network is simply the 168 feature/slot units of the left-justified 
phonological form. Between the input and output units there are two pools of 200 
"hidden" units. These units are called "hidden" because they have no direct interpretation 
in either input or output terms. Between adjacent pools, every unit of one pool is 
connected to every unit of the other pool. The model uses two pools of hidden units, 
because a model which had only one pool of units did not do as well at learning the training 
set. 

The final feature of the network is a set of identity-mapping connections between the left- 
justified input and the output. The network was designed to treat the learning of the 
"derived" forms of the verb as modifications of the phonological form of the "basic" 
present tense. The idea is that the child assumes that the past tense is somehow a 
modification of the present. This is done by including a set of connections that "copy" the 
left-justified phonological form of the input directly onto the output. This copying only 
sets a weak bias on the activation of the output units. This bias can be overcome with 
learning. Indeed, as we will see, the initial bias is usually overcome within a few trials. 
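
The architecture described above can be pictured as a forward pass through two hidden pools, with the copy connections adding a small bias to each output unit. The following is only a minimal sketch under stated assumptions: the logistic activation, the weight range, the value of COPY_WEIGHT, and the ordering of the input units (left-justified form first) are all hypothetical choices that the text does not specify.

```python
import math
import random

N_LEFT, N_CODA, N_CELLS = 168, 46, 5      # input pools named in the text
N_IN = N_LEFT + N_CODA + N_CELLS          # all input units together
N_HIDDEN = 200                            # size of each hidden pool
N_OUT = 168                               # left-justified output form
COPY_WEIGHT = 0.5                         # hypothetical identity-mapping strength


def logistic(net):
    """Squash a net input into (0, 1); the activation function is assumed."""
    return 1.0 / (1.0 + math.exp(-net))


def random_pool(n_in, n_out, rng, scale=0.05):
    """Fully connected weights between two adjacent pools."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]


def activate(weights, inputs, bias=None):
    """Activation of each receiving unit, with an optional per-unit bias."""
    result = []
    for j, row in enumerate(weights):
        net = sum(w * x for w, x in zip(row, inputs))
        if bias is not None:
            net += bias[j]
        result.append(logistic(net))
    return result


def forward(x, w_in, w_hid, w_out, copy_weight=COPY_WEIGHT):
    """Input -> hidden pool 1 -> hidden pool 2 -> output, plus the copy bias."""
    h1 = activate(w_in, x)
    h2 = activate(w_hid, h1)
    # Assumed layout: the first N_LEFT inputs hold the left-justified form,
    # so input unit i is copied onto output unit i.
    copy_bias = [copy_weight * x[i] for i in range(N_OUT)]
    return activate(w_out, h2, bias=copy_bias)


rng = random.Random(0)
w_in = random_pool(N_IN, N_HIDDEN, rng)
w_hid = random_pool(N_HIDDEN, N_HIDDEN, rng)
w_out = random_pool(N_HIDDEN, N_OUT, rng)
x = [0.0] * N_IN
x[0] = 1.0                                # one active left-justified unit
y = forward(x, w_in, w_hid, w_out)
print(len(y))
```

Because the copy term is only added to the net input of each output unit, the learned weights from the second hidden pool can outweigh it, which is the sense in which the bias is weak.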



[Figure: network architecture. Input units (168 left-justified, 46 right-justified, 5 forms); output units.]



2.4. The learning procedure. 

A major difference between the current model and the R&M model is that the new 
simulation uses the back-propagation learning algorithm (Rumelhart, Hinton, & Williams, 
1986a). This algorithm makes use of pools of hidden units to capture nonlinearities in 
problem spaces. The improved learning of the training set found in the new simulation is 
at least partly attributable to this more powerful learning algorithm. 
Training consisted of 24,000 runs or epochs, during which the network was trained to 
produce the correct output for various forms. Forms were presented with the [illegible] 
relative frequencies found in Francis and Kucera (1982), with the most frequent form 
being produced exactly once per epoch, and the rarest form about once every [illegible] epochs. 
This resulted in the presentation of about 1.3 million forms during the course of training. 
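
One way to picture a frequency-weighted presentation schedule of this kind is to show each form in an epoch with a probability proportional to its corpus frequency, scaled so that the most frequent form appears in every epoch. The sketch below is an assumption about how such a schedule could work, not a description of the original program; the function name and the toy counts are hypothetical and are not taken from Francis and Kucera (1982).

```python
import random


def sample_epoch(frequencies, rng):
    """Return the forms presented in one epoch.

    Each form is shown with probability (its count / the top count), so the
    most frequent form is shown in every epoch and rare forms only now and then.
    """
    top = max(frequencies.values())
    return [form for form, count in frequencies.items()
            if rng.random() < count / top]


# Hypothetical counts for illustration only.
toy_frequencies = {"said": 1000, "went": 500, "clung": 2}

rng = random.Random(1)
shown = [sample_epoch(toy_frequencies, rng) for _ in range(1000)]
counts = {form: sum(form in epoch for epoch in shown)
          for form in toy_frequencies}

# Average number of forms per epoch implied by the figures in the text:
# about 1.3 million presentations spread over 24,000 epochs.
forms_per_epoch = 1_300_000 / 24_000
print(counts, round(forms_per_epoch, 1))
```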

The input corpus was derived from the Francis and Kucera (1982) corpus of English word 
frequencies. The 6949 most frequent verb forms - including present, past, past participle, 
present participle and third person singular - were first selected as a base for the 
simulation. These were derived from 2161 different verbs. [illegible] 
forms (e.g. past tense for "spit" can be either "spit" or "spat") [illegible] 
the less common forms. All forms which had more than three syllables, more than three 
consonantal phonemes in a row, or more than two vocalic phonemes in a row were 
removed. The remaining 6090 forms, derived from 2062 verbs, represent the [illegible] 
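
The removal of forms that do not fit the slot template can be illustrated with a small filter. This is a sketch under assumptions: each phoneme is written as a single character, the vowel inventory VOWELS is a hypothetical stand-in for the real one, and the limit of three vowel nuclei is taken from the trisyllabic template.

```python
from itertools import groupby

VOWELS = frozenset("aeiouIEUO")   # hypothetical one-character vowel symbols


def fits_template(segments, vowels=VOWELS):
    """True if a form has at most 3 nuclei, 3 consonants or 2 vowels in a row."""
    runs = [(is_vowel, len(list(group)))
            for is_vowel, group in groupby(segments, key=lambda s: s in vowels)]
    nuclei = sum(1 for is_vowel, _ in runs if is_vowel)
    longest_consonants = max((n for v, n in runs if not v), default=0)
    longest_vowels = max((n for v, n in runs if v), default=0)
    return nuclei <= 3 and longest_consonants <= 3 and longest_vowels <= 2


print(fits_template("strIkt"))    # three consonants in a row: kept
print(fits_template("tEksts"))    # four consonants in a row: removed
```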

3. The Results 

[illegible] consider some general [illegible] 

[illegible] produced as output during the first 40 to 80 [illegible] period of reproduction 
extended as far as epoch 200 before the first attempts were made to produce 
overregularizations such as "cutted" or "hitted." 

The [illegible] spent a period from about epoch 200 to 
about epoch 3000 exploring alternative vowel shifts and modifications 
[illegible] /d/ and /Id/ [illegible] explored use of final [illegible] 

[illegible] performance was stabilizing by epoch 3000 [illegible] 

6. In most cases, the last 10,000 epochs are error free. 

There were three forms that were only learned in the last 10,000 epochs. These 
were "bled" at 14780, "brought" at 15810, and [illegible]. 
[illegible] low frequency irregular forms remained unlearned at the end of [illegible] 

In the testing done at the end of the simulation, nine of the thirteen untrained past 
tense irregulars were missed. Six of these were simply [illegible] 

By the first test point the regular past tense was already over 99% correct. In overall terms 
the simulation obviously did very well at its task of learning the past tense. These results 
are encouraging, but how well was the model able to address the 13 specific problems we 
examined earlier? 

3.1. The U-shaped learning problem 

The U-shaped learning for verbs involves four observable components: early correct 
usage, subsequent overregularization, coexistence of correct and incorrect forms, and final 
correct usage. The new simulation correctly models the last three components. 

1. Overregularization. Usually, overregularization begins shortly after the first 50 
or 100 epochs. For nearly every one of the irregular verbs, except for zero-marked 
pasts such as "hit" or "cut," there is a clear early period of overregularization. This 
is not an across-the-board phenomenon, since some verbs show much more 
overregularization than others and some show it for a much longer period than 
others. 

2. Coexistence of correct and incorrect forms. There is then a period of 
between 300 and 3000 epochs for almost all irregular verbs where the correct form 
coexists with overregularized forms. For a verb like "bent" this period ends by 
epoch 2800; for a verb like "arise" it extends to epoch 8010. 

3. Final correct usage. For nearly all of the verbs, there then follows a long 
period of final correct usage. It is important to note that, once the network reaches 
correct performance, deviations from the correct model are fairly infrequent. There 
are indeed late occurring errors against a background of correct performance, such 
as the uses of "binded" for "bound" on epochs 1950 and 2300. However, when 
we realize that the learning went on for 24000 epochs, it is clear that much of this 
period was spent in error-free performance for this verb. In this sense, the model 
behaves fairly deterministically, reflecting the deterministic nature of the learning 
algorithm. If we were interested in simulating adult speech errors, we would have 
had to employ a non-deterministic model. 

However, the simulation fails to capture the fourth component of the U-shaped pattern - 
early correct usage for irregular forms. 

3.2. The "ated" problem 

P&P argue that forms such as "ated" arise when the child confuses "ate" with the present. 
Capturing this type of processing in this network was extremely easy. We simply 
constructed a second set for testing generalization that was composed of four irregular past 
forms. The verbs were "ate," "broke," "ran," and "bought." The results are not 
particularly surprising. The network produced as output the forms "ated," "broked," 
"ranned," and "boughted." This was the case for "ate," "ran," and "bought" at the first 
test point (epoch 4000). "Broke," however, yielded "broke" as its past tense in this 
particular type of generalization testing until the test at 12000 epochs, after which point it 
produced "broked." 

3.3. The "hitted" problem 
Like the R&M model, the new simulation produced fewer overregularizations for zero-
marking verbs like "cut" and "hit" than for other irregulars. Indeed the zero-marking class 
was learned quite quickly with very few errors. For example, "burst" had only three 
"bursted" errors, and only seven errors in total. The verb "cast" had only one "casted" 
error after epoch 1950; "cost" had none after epoch 1340, and "cost" had none after epoch 
210. Perhaps the most dramatic evidence for the power of what Menn and MacWhinney 
(1984) called the "repeated morpheme constraint" is in the verb "cut" which had only three 
errors in 24000 epochs - two of which were uses of the overregularized "cutted" on trials 
170 and 210. When we compare this virtually error-free learning of "cut" with the 
difficulties the network had in learning an alternation such as "deal - dealt," it is clear that 
verbs of the "hit" and "cut" type are much easier to learn than other irregulars. 

3.4. The "algalgal" problem 

The "algalgal" problem disappears in the new simulation. The phonological representation 
used in the new simulation guarantees unique representations for different sounding words. 
The left-justified representation of Oykangand "algal" is CCCaVlgCaVlCCVVCCC, 
whereas that of "algalgal" is CCCaVlgCaVlgCaVlCC. Since these different words have 
different representations, it is clear that the "algalgal" problem is solved by the new 
simulation. 
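
The slot filling behind these two strings can be sketched in a few lines. The sketch assumes one character per segment and takes a word as an alternating list of consonant clusters and vowel nuclei, starting with the (possibly empty) initial consonant cluster; the function name and this input convention are illustrative assumptions. Unfilled slots are shown with the placeholder letter of the template.

```python
# The left-justified template as alternating groups: CCC VV CCC VV CCC VV CCC.
TEMPLATE = [("C", 3), ("V", 2)] * 3 + [("C", 3)]


def left_justified(clusters):
    """Fill clusters into the template, left-justified within each group.

    `clusters` alternates consonant cluster, vowel nucleus, consonant cluster,
    and so on, beginning with the word-initial consonant cluster ("" if none).
    """
    if len(clusters) > len(TEMPLATE):
        raise ValueError("word has more clusters than the template allows")
    filled = []
    for index, (placeholder, width) in enumerate(TEMPLATE):
        segments = clusters[index] if index < len(clusters) else ""
        if len(segments) > width:
            raise ValueError("cluster does not fit its slot group")
        filled.append(segments + placeholder * (width - len(segments)))
    return "".join(filled)


algal = left_justified(["", "a", "lg", "a", "l"])
algalgal = left_justified(["", "a", "lg", "a", "lg", "a", "l"])
print(algal)      # CCCaVlgCaVlCCVVCCC
print(algalgal)   # CCCaVlgCaVlgCaVlCC
```

The two words share their first eleven slots and diverge from the third consonant group onward, so the resulting patterns are distinct.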

3.5. The "slit-silt" problem 

The "slit-silt" problem also disappears with the new simulation, because phonologically 
similar words now have more similar representations. In the case of "slit" and "silt" the /I/ 
appears in exactly the same slots for both words. The final /t/ also appears in the same slot 
in the right-justified representation. 

3.6. The "brag-grab" problem 

We ran an auxiliary simulation identical to the basic simulation, but in which the past tense 
was formed simply by reversing the order of segments in the present tense. For example, 
the past tense of "brag" was "grab" and the past tense of "trickle" was "Ikirt." This 
transformation greatly impaired the performance of the network on the past tense. After 
24000 epochs of training, only 15% of the forms were correct. Apparently, the network 
cannot learn an alternation of this type. 

3.7. The phonological regularities problem 

The new simulation addresses the phonological regularities problem in a variety of ways. 
First, the change to a feature/slot representation improves its ability to model generalization 
across the phonological inventory. To demonstrate this, the simulation was also given the 
verb "bach" /bax/ as a generalization test and correctly produced the past tense form 
"bached" /baxt/. Second, by including multiple forms of each verb in training, the 
simulation was able to demonstrate cross-paradigm regularities. For example, the 
progressive of "accompanying" was occasionally formed as "accompanyng." The deletion 
of the initial /I/ of the progressive -ing suffix appears to be a result of paradigmatic pressure 
from similar deletion patterns occurring in the other suffixes. Of course, the network 
learns to counteract this pressure, but the important fact is that the new simulation shows 
the presence of such phonological regularities. Other evidence of the impact of 
[illegible] is provided by interpenetrations of past participle forms into [illegible] 

3.8. The homophony problem 

A small auxiliary simulation was run to demonstrate the ability of the network to acquire 
past tense forms for homophones. The verbs used were "ring," "wring," "jump," "want," 
and "run." The network architecture was supplemented by the following 21 semantic 
features: action, auditory-result, cause-contact, circle, completive, high-pitch, internal-
state, object-gap, object-state, object-thing, positional-change, response, sharp-onset, 
surround, torque, use-of-hands, use-of-feet, vertical direction, volitional, and 
whole-body-motion. The results of this simulation were quite simple. The network 
learned to produce "rang" as the past tense of "ring-1," "ringed" as the past tense of 
"ring-2," and "wrung" as the past tense of "wring" within 2400 epochs. These results indicate 
that nets of this type can readily resolve homophony. 

3.9. The convergence problem 

By epoch 4000, all but seven of the regular past tense forms were being produced 
correctly. [illegible] of the [illegible] past tense forms were correct. By 
[illegible] of the regular past tense forms were correct. By the end of training at 
epoch 24,000, errors were being made on only 11 of the irregular pasts. Perfect 
performance on the present progressive was achieved by the first check at 4000 epochs. 
[illegible] perfected by the end of the simulation. Learning of 
the past participle followed a pattern similar to that for the past. It is clear that the network 
succeeded at its assigned task of learning the English verb paradigm. If we had allowed the 
network to run for [illegible] additional days or given it additional hidden unit resources, we 
probably could have reached complete convergence. 

3.10. The regular pattern problem 

[illegible] significant error. By the first check at 
epoch 4000, only seven of the 1059 regular pasts were being missed. This dropped to one 
[illegible] epoch 16000 performance was perfect. 
This type of learning is exactly what we see in children. 
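
As a check on the figures for the regular pattern, seven misses out of 1059 regular pasts corresponds to just over 99% correct, which agrees with the "over 99%" reported for the first test point. The variable names below are illustrative.

```python
regular_pasts = 1059      # regular past tense forms reported in the text
missed = 7                # forms still being missed at the check
accuracy = (regular_pasts - missed) / regular_pasts
print(f"{accuracy:.2%}")  # prints: 99.34%
```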

P&P suggest that, in some way, connectionist models are misportraying the role of the 
regular pattern. Before this issue can be seriously examined, P&P will have to show how 
this issue can have real consequences for computational simulations. For the moment all 
[illegible] regular pattern just [illegible] 



3.11. The cryptorule problem 
L&B criticized R&M for biases in the selection and construction of Wickelphones. In 
general, these complaints were reasonable. However, L&B attempted to move from these 
reasonable complaints to the general claim that acceptance of a particular featural 
representation in a PDP network is tantamount to acceptance of a production system 
architecture. There is no reason to accept this linkage, particularly in regard to the current 
simulation. More specifically, there is nothing in the featural representation of the current 
simulation that biases it toward the acquisition of some particular rule. Instead, the 
representation was chosen to be powerful enough to facilitate learning of different types of 
rules in different languages. It is true that the representation is capable of expressing 
something like "final t." However, there is nothing wrong with that ability and nothing that 
can be construed as involving either TRIGS or "tricks." 

3.12. The early noise problem 

The new simulation no longer spends its initial epochs groping toward the ability to 
produce some recognizable form of the verb. From the first trials, it is producing either the 
present tense or some variant of the present tense. In this regard the model provides a more 
accurate characterization of the way in which the two-year-old child works on this task. 

3.13. The direct access problem 

The new simulation does not solve the direct access problem. 



4. Is there a better symbolic model? 

If there were some other approach that provided an even more accurate characterization of 
the learning process, we might still be forced to reject the connectionist approach, despite 
its successes. The proper way of debating conceptualizations is by contrasting competitive 
implementations. To do this in the present case, we would need a symbolist implementation 
that can be contrasted with the current connectionist implementation. P&P sketch out a 
piece of such an alternative but advise the reader not to take it as a "serious model." 
Unfortunately, the reader has no alternative but to take it seriously. It turns out that the 
model which P&P are proposing as the main symbolist alternative to the R&M model is 
a fragment of the model of MacWhinney (1978). We are not arguing that the MacWhinney 
model was ill-constructed and descriptively inadequate. But when it comes to actually 
implementing the account of MacWhinney (1978), a myriad of detailed decisions must be 
made regarding the shape of possible input forms, the ways in which rule strengths should 
be incremented, the algorithm for production matching and conflict resolution, and so on. 
The current approach avoids these ad hoc modelling decisions. 



5. Summary 

We have examined a series of 13 problems raised by Pinker and Prince (1988) and Lachter 
and Bever (1988) against the verb learning model of Rumelhart and McClelland (1986). 
These problems were presented as fundamental flaws in the conceptualizations underlying 
connectionism that could well call into question any application of connectionist models to 
language processing or learning. To address these problems, we constructed a new 
simulation using a new input representation based on feature/slot units and two views on 
input words, a new architecture using identity mappings and hidden units, a new learning 
algorithm, and an input corpus that included all five cells of the English verb paradigm. 
Together, these changes led to a vast improvement in the performance of the model. All of 
the 13 problems were addressed successfully except for those dealing with the problem of 
direct access within connectionist nets. 

As a result of this work we conclude that connectionist models are indeed extremely useful 
ways of characterizing the learning of inflectional systems. We also conclude that the 
critiques of Pinker and Prince and Lachter and Bever erred in confusing conceptualizations 
with implementations. 